fix: definition coverage gaps in words hub and 5 small languages #135
Conversation
The words hub (`/<lang>/words`) and individual word pages (`/<lang>/word/<id>`) only read from the disk cache for definitions, missing the 206K pre-built kaikki definitions that exist in memory. This caused most words on the words hub to show no definition (e.g. Spanish had 3/7,405 cached). Now both routes fall back to kaikki native → kaikki English lookups on cache miss. These are in-memory dict lookups with zero latency.

Also adds 5 zero-coverage languages (`pau`, `ie`, `rw`, `tlh`, `qya`) to the LLM fallback allowlist — they had no kaikki files, UNRELIABLE parser confidence, and were missing from the LLM allowlist, so every definition request returned None.
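The cache-miss fallback chain described above can be sketched roughly as follows. This is a minimal illustration, not the actual `app.py` code: the dict names and the `lookup_definition` helper are assumptions, and the kaikki data is stubbed with toy entries.

```python
# Illustrative sketch of the definition fallback chain (names are assumptions).
# Pre-built kaikki definitions live in plain in-memory dicts, so a cache miss
# costs one dict lookup per tier — no disk or network I/O.
KAIKKI_NATIVE = {("es", "perro"): "Animal doméstico de compañía"}  # native-language defs
KAIKKI_ENGLISH = {("es", "perro"): "dog"}                          # English defs

def lookup_definition(word, lang_code, disk_cache):
    """Disk cache first, then kaikki native, then kaikki English."""
    cached = disk_cache.get((lang_code, word))
    if cached:
        return cached
    key = (lang_code, word.lower())
    # Native-language definition preferred; English is the last resort.
    return KAIKKI_NATIVE.get(key) or KAIKKI_ENGLISH.get(key)

print(lookup_definition("perro", "es", {}))  # falls through to the kaikki native tier
```

With an empty disk cache the lookup falls straight through to the in-memory dicts, which is the behavior the fix adds to both routes.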
@coderabbitai review
📝 Walkthrough

The changes implement a disk-cache-first fallback strategy for definition lookups across multiple functions in the application, importing new Wiktionary lookup functions for native and English language definitions. Additionally, five new language codes are added to the language names mapping to support Palauan, Interlingue, Kinyarwanda, Klingon, and Quenya.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
✅ Actions performed: review triggered.
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
webapp/app.py (1)
Lines 1418-1431: ⚠️ Potential issue | 🟡 Minor

Inconsistent handling of negative cache entries compared to `language_words_hub`.

In `language_words_hub` (lines 1174-1175), the code checks `loaded and loaded.get("definition")` before assigning, which correctly skips negative cache entries like `{"not_found": True, "ts": ...}`. Here in `word_page`, the assignment `definition = loaded if loaded else None` would treat a negative cache entry as a valid definition (since `loaded` is truthy), causing the kaikki fallback to be skipped.

Proposed fix to align with `language_words_hub` logic:
```diff
 if os.path.exists(cache_path):
     try:
         with open(cache_path, "r") as f:
             loaded = json.load(f)
-        definition = loaded if loaded else None
+        if loaded and loaded.get("definition"):
+            definition = loaded
     except Exception:
         pass
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@webapp/app.py` around lines 1418 - 1431, The code in word_page incorrectly treats any loaded cache JSON as a valid definition, causing negative cache entries to short-circuit kaikki lookups; update the cache-load branch in word_page to mirror language_words_hub by assigning definition only when loaded and loaded.get("definition") is truthy (i.e., change the logic that sets definition = loaded if loaded else None to require loaded.get("definition")), so negative entries like {"not_found": true, "ts": ...} are skipped and lookup_kaikki_native / lookup_kaikki_english still run.
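The distinction the reviewer is drawing can be demonstrated in isolation. This is a standalone sketch, not application code; the cache-entry shapes are taken from the comment above.

```python
# A negative cache entry is a non-empty dict, hence truthy — so the bare
# `loaded if loaded else None` check accepts it as a definition.
negative_entry = {"not_found": True, "ts": 1700000000}
valid_entry = {"definition": "dog", "ts": 1700000000}

# Buggy check (word_page): any non-empty dict passes, skipping the fallback.
definition_buggy = negative_entry if negative_entry else None

# Correct check (language_words_hub): require an actual "definition" key.
definition_fixed = negative_entry if negative_entry.get("definition") else None

print(definition_buggy)  # the negative entry itself — kaikki fallback skipped
print(definition_fixed)  # None — kaikki fallback still runs
print(valid_entry if valid_entry.get("definition") else None)  # valid entry still accepted
```

The stricter check lets genuine cached definitions through while letting `{"not_found": True, ...}` fall through to `lookup_kaikki_native` / `lookup_kaikki_english`.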
📝 Walkthrough

The pull request enhances definition retrieval by implementing a two-step fallback mechanism for missing cached definitions, attempting native-language lookups first before falling back to English equivalents. Additionally, five new languages (Palauan, Interlingue, Kinyarwanda, Klingon, Quenya) are registered in the language-name mapping system.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🧹 Nitpick comments (1)
webapp/app.py (1)
Lines 1418-1431: The fallback logic is correctly duplicated from `language_words_hub`. The implementation is consistent with the hub page, which is good.

Consider extracting the shared definition-loading logic (disk cache + kaikki fallbacks) into a helper function to reduce duplication. This would consolidate the pattern used in both `language_words_hub` and `word_page`.

♻️ Optional: Extract shared definition loading to a helper:
```python
def _load_definition_with_fallback(word, lang_code):
    """Load definition from disk cache, falling back to kaikki pre-built dicts."""
    cache_path = os.path.join(WORD_DEFS_DIR, lang_code, f"{word.lower()}.json")
    if os.path.exists(cache_path):
        try:
            with open(cache_path, "r") as f:
                loaded = json.load(f)
            if loaded and loaded.get("definition"):
                return loaded
        except Exception:
            pass
    definition = lookup_kaikki_native(word, lang_code)
    if not definition:
        definition = lookup_kaikki_english(word, lang_code)
    return definition
```

Then use `definition = _load_definition_with_fallback(word, lang_code)` in both routes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@webapp/app.py` around lines 1418 - 1431, Extract the duplicated disk-cache + kaikki fallback logic into a helper function (e.g. _load_definition_with_fallback(word, lang_code)) that: checks the cache path under WORD_DEFS_DIR/lang_code/word.lower() and attempts json.load inside a try/except, returns the loaded dict if valid (or None on error), otherwise calls lookup_kaikki_native(word, lang_code) and then lookup_kaikki_english(word, lang_code) as fallbacks and returns the first non-None result; then replace the duplicated blocks in word_page and language_words_hub with a single call to this helper to obtain definition.
Summary

The words hub and individual word pages (`/<lang>/words`, `/<lang>/word/<id>`) only read definitions from the disk cache, completely missing the 206K pre-built kaikki definitions that are already loaded in memory. This caused most words on the words hub to show no definition — e.g. Spanish showed definitions for only 3 out of 7,405 words.

Changes

- `app.py`: Both `language_words_hub()` and `word_page()` now fall back to `lookup_kaikki_native()` → `lookup_kaikki_english()` on disk cache miss. These are in-memory dict lookups (zero latency, no network calls).
- `wiktionary.py`: Added `pau`, `ie`, `rw`, `tlh`, `qya` to `LLM_LANG_NAMES` so they can use the gpt-4o-mini fallback.

Test plan

- `/es/words` — definitions should now appear for all words with kaikki entries
- `/pau/words` — LLM definitions should be fetchable via individual word pages
- `/en/word/<id>` — definition should server-render without needing JS fallback fetch
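The `wiktionary.py` change amounts to a small mapping update. A hedged sketch of what the allowlist gate looks like — the dict name `LLM_LANG_NAMES` comes from the description above, but its pre-existing entries and the `llm_fallback_allowed` helper are illustrative assumptions:

```python
# Language code -> display name used when building the LLM prompt.
# The five new entries cover languages with no kaikki files and UNRELIABLE
# parser confidence, so the LLM fallback is their only definition source.
LLM_LANG_NAMES = {
    "es": "Spanish",  # assumed pre-existing entry, shown for context
    # newly added zero-coverage languages
    "pau": "Palauan",
    "ie": "Interlingue",
    "rw": "Kinyarwanda",
    "tlh": "Klingon",
    "qya": "Quenya",
}

def llm_fallback_allowed(lang_code):
    """Only languages in the allowlist may hit the gpt-4o-mini fallback."""
    return lang_code in LLM_LANG_NAMES

print(llm_fallback_allowed("tlh"))  # True
print(llm_fallback_allowed("zz"))   # False
```

Before this change, a request for a `tlh` word fell through every tier (no kaikki file, parser too unreliable, not in the allowlist) and returned None; adding the codes to the mapping opens the final tier.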